-- card: 2822 from stack: in.5
-- bmap block id: 2478
-- flags: 4000
-- background id: 2153
-- name: TEXIntroCard

-- part 67 (button)
-- low flags: 00
-- high flags: A003
-- rect: left=401 top=215 right=237 bottom=510
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 65535
-- font id: 3
-- text size: 10
-- style flags: 0
-- line height: 13
-- part name: Append Files
----- HyperTalk script -----
on mouseUp
  answer "Select a base file, then others to append to it."
  answer "Choose 'Cancel' when finished appending."
  get appendDeleteFiles ("APPEND")
  if It is not empty then answer It
end mouseUp

-- part 70 (button)
-- low flags: 00
-- high flags: A003
-- rect: left=401 top=189 right=211 bottom=510
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 65535
-- font id: 3
-- text size: 10
-- style flags: 0
-- line height: 13
-- part name: Build Index
----- HyperTalk script -----
on mouseUp
  answer "Must close any open dataspaces first..." with "Cancel" or "OK"
  if It is "Cancel" then exit mouseUp
  closeAnyOpenDataspaces
  get indexTEXfile()
  if char 1 of It is "{" then answer It
end mouseUp

-- part 75 (button)
-- low flags: 00
-- high flags: A003
-- rect: left=401 top=241 right=263 bottom=510
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 65535
-- font id: 3
-- text size: 10
-- style flags: 0
-- line height: 13
-- part name: Delete Files
----- HyperTalk script -----
on mouseUp
  answer "Select files to be deleted -- beware!"
  get appendDeleteFiles ("DELETE")
  if It is not empty then answer It
end mouseUp

-- part 96 (button)
-- low flags: 00
-- high flags: 0000
-- rect: left=476 top=313 right=342 bottom=512
-- title width / last selected line: 0
-- icon id / first selected line: 1011 / 1011
-- text alignment: 1
-- font id: 0
-- text size: 12
-- style flags: 0
-- line height: 16
-- part name: Home
----- HyperTalk script -----
on mouseUp
  go home
end mouseUp

-- part 97 (field)
-- low flags: 01
-- high flags: 0001
-- rect: left=29 top=167 right=181 bottom=279
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 1
-- font id: 3
-- text size: 9
-- style flags: 16384
-- line height: 12
-- part name: versionField

-- part 100 (button)
-- low flags: 00
-- high flags: A003
-- rect: left=401 top=267 right=289 bottom=510
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 65535
-- font id: 3
-- text size: 10
-- style flags: 0
-- line height: 13
-- part name: Open Dataspace
----- HyperTalk script -----
on mouseUp
  send mouseUp to button "Browse Dataspace"
  send mouseUp to background button "Open Dataspace"
end mouseUp

-- part 101 (button)
-- low flags: 00
-- high flags: 0000
-- rect: left=382 top=300 right=342 bottom=424
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 65535
-- font id: 3
-- text size: 10
-- style flags: 0
-- line height: 13
-- part name: Browse Dataspace
----- HyperTalk script -----
on mouseUp
  visual effect iris open to black
  visual effect iris open
  go to next card
end mouseUp

-- part 102 (button)
-- low flags: 00
-- high flags: A003
-- rect: left=401 top=41 right=63 bottom=510
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 65535
-- font id: 3
-- text size: 10
-- style flags: 0
-- line height: 13
-- part name: Introduction
----- HyperTalk script -----
on mouseUp
  if the visible of card field introField is true then
    hide card field introField
  else
    show card field introField
  end if
end mouseUp

-- part 108 (field)
-- low flags: 01
-- high flags: 0001
-- rect: left=29 top=193 right=207 bottom=279
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 1
-- font id: 3
-- text size: 9
-- style flags: 16384
-- line height: 12
-- part name: andReasField

-- part 103 (field)
-- low flags: 81
-- high flags: 0004
-- rect: left=6 top=19 right=341 bottom=390
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 0
-- font id: 2
-- text size: 12
-- style flags: 0
-- line height: 16
-- part name: introField
----- HyperTalk script -----
on mouseUp
  hide card field introField
end mouseUp

-- part 104 (button)
-- low flags: 00
-- high flags: A003
-- rect: left=401 top=69 right=91 bottom=510
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 65535
-- font id: 3
-- text size: 10
-- style flags: 0
-- line height: 13
-- part name: Help
----- HyperTalk script -----
on mouseUp
  if the visible of card field helpField is true then
    hide card field helpField
  else
    show card field helpField
  end if
end mouseUp

-- part 106 (button)
-- low flags: 00
-- high flags: A003
-- rect: left=401 top=97 right=119 bottom=510
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 65535
-- font id: 3
-- text size: 10
-- style flags: 0
-- line height: 13
-- part name: Author's Remarks
----- HyperTalk script -----
on mouseUp
  if the visible of card field authorField is true then
    hide card field authorField
  else
    show card field authorField
  end if
end mouseUp

-- part 109 (button)
-- low flags: 00
-- high flags: A003
-- rect: left=401 top=125 right=147 bottom=510
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 65535
-- font id: 3
-- text size: 10
-- style flags: 0
-- line height: 13
-- part name: Technical Notes
----- HyperTalk script -----
on mouseUp
  if the visible of card field techField is true then
    hide card field techField
  else
    show card field techField
  end if
end mouseUp

-- part 105 (field)
-- low flags: 81
-- high flags: 2007
-- rect: left=3 top=19 right=340 bottom=462
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 0
-- font id: 2
-- text size: 12
-- style flags: 0
-- line height: 16
-- part name: helpField
----- HyperTalk script -----
on mouseUp
  hide card field helpField
end mouseUp

-- part 110 (field)
-- low flags: 81
-- high flags: 2007
-- rect: left=3 top=19 right=340 bottom=497
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 0
-- font id: 4
-- text size: 9
-- style flags: 0
-- line height: 12
-- part name: techField
----- HyperTalk script -----
on mouseUp
  hide card field techfield
end mouseUp

-- part 107 (field)
-- low flags: 81
-- high flags: 2004
-- rect: left=1 top=21 right=341 bottom=512
-- title width / last selected line: 0
-- icon id / first selected line: 0 / 0
-- text alignment: 0
-- font id: 3
-- text size: 9
-- style flags: 0
-- line height: 12
-- part name: authorfield
----- HyperTalk script -----
on mouseUp
  hide card field "authorField"
end mouseUp

-- part contents for card part 97
----- text -----
version 0.5 - by Mark Zimmermann

-- part contents for card part 103
----- text -----
Welcome to TEX! — version 0.5 — 19880904
©1988 Mark Zimmermann — all rights reserved
————————————————————————————————————————
To get started using TEX quickly:
• find or make a TEXT file — the bigger the better!*
• click the "Build Index" button to index it...
• click the "Open Dataspace" button and select your text file to see two windows into its index...
• scroll around and click on a word in the index window to see a key-word-in-context display for that word...
• scroll around in the context display and click on a line to jump into the full text of your file at that location...
• experiment with the buttons to move around in your dataspace, browsing and gathering information...
• read the Help text to learn about advanced techniques such as subspace navigation and multi-dataspace browsing...
• mail in license fee & repeat steps above!
————————————————————————————————————————
*You must have enough free disk space to build the index; about twice the size of the original file is usually ample.

-- part contents for card part 105
----- text -----
Welcome to TEX! — version 0.5 — 19880904
Copyright ©1988 Mark Zimmermann — all rights reserved
(see below for license fee information)
———————————————————————————————————————————————
Help is organized as follows:
• ABOUT THIS STACK
• BACKGROUND
   - License fees and distribution
   - History & Acknowledgements
• HOW TO USE TEX
   - Building an Index
   - Elementary Browsing
   - Advanced Techniques & Concepts
• IN CASE OF TROUBLE
• SOME TECHNICAL NOTES
• FINAL REMARKS
———————————————————————————————————————————————
(Help is a single, simple text field, so if you wish you may print it out for reading and reference at your leisure.)

Standard Legalistic Disclaimer: I have never lost any data using TEX or its related programs, but prudence dictates that you should always back up your files and work with copies. I cannot be responsible for ill effects that follow, directly or indirectly, from the use of my software. There are no guarantees in life. All other disclaimers are included herewith by reference. Use TEX at your own risk!

——————————————————ABOUT THIS STACK————————————————

I want omniscience. Until then, I build tools for people to use on massive collections of free-text information. TEX is the latest of those tools.
All conventional database systems (that I know of) fall short in one or more ways:
• they require “clean” input data, in highly structured formats;
• they break down if applied to files larger than a few megabytes;
• they are intolerably slow in answering simple queries;
• they do not allow easy, interactive free-association and browsing;
• they are not integrated with writing or programming tools;
• they demand too much work to get data into the system;
• they lack a user interface fit for an intelligent being;
• they cost too much, or only run on expensive/exotic hardware.

Real life is not clean! Data doesn't arrive in neat little zones and fields pre-marked with delimiters — it comes as an overwhelming flood of information from diverse sources. TEX lets you handle that kind of textual data. You can take technical manuals, religious texts, archives of downloaded information, school notes, or any other files that you may have, as long as the files contain “words” which are separated by spaces or punctuation.

I call a big file of free text a “Dataspace”. TEX takes a dataspace and builds machine-readable indices to every word. Then, TEX lets you browse that dataspace, move around in it, gather pieces of information together, and discover new things that were hidden by the masses of text — gems or nuggets of valuable data. The computer does what it does best: meticulous organization of information and rapid execution of instructions. You do what you do best: thinking, pattern-recognition, intuition. Symbiosis/synergy!

————————————————————BACKGROUND——————————————————

---------------License Fees & Distribution---------------

Please see my short essay (click the button "Author's Remarks") for my views on the morality of overpriced software. In order to continue developing TEX, I need your help!

**********************************************************************
Please redistribute this stack freely, keeping my words intact!
Individual (or family) users of TEX should send a $10 license fee to me at the address below. The corporate license fee is $40 per copy in simultaneous use (reduced to $10/copy for nonprofit or educational users). If you sell this stack or use its XFCNs in a commercial product, send 2% of the retail list price per copy sold.
**********************************************************************

Send payments to:
   Mark Zimmermann
   9511 Gwyndale Drive
   Silver Spring, MD 20910
   USA
   telephone: 301-565-2166
   CompuServe: [75066,2044]
   arpanet: science@NEMS.ARPA

In exchange for your lifetime license fee, you get:
 - aperiodic information about enhancements and extensions;
 - free support and advice on the use of TEX;
 - copies of new versions for a nominal charge ($5+SASE+disk);
 - a nice warm feeling knowing that you are supporting further research into massive free-text dataspace tool-building.

Be a part of the adventure! In addition to sending in your license fee, please take the time to write me a letter about how you're using TEX, or send a disk with your ideas, suggestions, or modifications to the stack. All monies received are used to enhance this software and to pay for creation and distribution expenses.

Comparable "commercial" free-text database products cost hundreds or thousands of dollars. You have something precious here — take advantage of it, please! Build upon my work, extend it, and share your results with the community of amateurs and scholars. I am very proud of this software. I have chosen to distribute it at an extremely low cost, so that it can be more widely used and can help more people. Please join me!

As mentioned above, if you wish to use the external functions from TEX in a commercial product, the current license fee is 2% of the retail list price per copy distributed. Contact me for details, please. (That's a couple of orders of magnitude less than competing indexing products, I might add.)
If you prefer to take the public-domain XFCNs from TEXAS and work from those, then no royalties or licensing fees apply, of course.

I do not want to be in the business of copying and mailing disks — it takes far too much time away from programming! All of the programs mentioned here have been widely posted in public fora and are being distributed by major user groups (such as BMUG in Berkeley). But if you absolutely cannot find the current version of TEX or any of the other programs, you may (sigh) contact me. Send $15, along with a self-addressed stamped envelope and a formatted 800kB Macintosh disk, to the above address. I'll send you the latest versions of TEX, TEXAS, MultIndexer, qndxr.c, and brwsr.c, along with their C source code (or as much as will fit on one disk, anyway). The $15 disk copying fee includes the $10 individual/family license fee for TEX, so if you've already registered and paid your $10, you may deduct that amount; if you're requesting a disk on behalf of a for-profit commercial enterprise, you should add another $30 for the first-copy corporate license fee.

I'm sorry, but I cannot promise to respond to correspondence which does not include a self-addressed stamped envelope for my reply. (The only exceptions are letters with foreign postmarks, written in sincere but broken English.)

------------History & Acknowledgements------------

TEX is a conceptual descendant of TEXAS, an indexer/browser system which I developed in 1987. TEXAS is a free, public-domain HyperCard stack, available on many public bulletin boards and computer networks. In turn, TEXAS itself is a descendant of earlier free-text browser programs which I wrote in MacForth during 1985-1986. During the summer of 1988 I took the publicly available source code of TEXAS and completely rewrote and restructured it. The result is the stack you now have — TEX.
For help with the development of TEX, I thank many people:
• my wife, Paulette Dickerson, and my children, Merle, Gray, and Robin;
• Andreas Vichr, who did most of the graphics and design work (if you want to find his monument, look around you!);
• my teachers (formal and informal) over the years;
• the developers of the Apple Macintosh, of HyperCard, of MacForth, and of Think's Lightspeed C;
• the enthusiastic users of earlier dataspace browsing programs — especially Andy Jewell, Sam Thornton, Joe Golton, and many others whose names I have misplaced (I'm only sorry that I haven't been able to incorporate more of your suggestions yet!) — and those who made voluntary contributions to encourage further development;
• Vernor Vinge (author of TRUE NAMES), and other science-fiction writers who have thought about future human-machine interactions;
• the scholars, researchers, and amateurs of the world, who have chosen to share their discoveries rather than attempt to achieve short-term profit by them. (Again, see my essay under "Author's Remarks".)

—————————————————HOW TO USE TEX———————————————————

--------Building a Dataspace--------

Before you can use TEX to navigate through a dataspace, you have to build that dataspace from your original text file. Index-building is a very simple process: just click the “Build Index” button on the TEX “Menu” card, select the text file to index from the standard Macintosh file dialog, and then sit back and relax!

There is no need to reformat or pre-process your information, no need to arrange items into rows or columns, fields or zones, and no need to read through the text adding links or pointers by hand. The computer does the work, not you.

Every word processor has the ability to save files as standard ASCII text (sometimes called “Text Only”). For speed and simplicity, TEX does not work with special or proprietary file formats.
For your convenience, the Menu card of TEX provides buttons to join text files together (“Append Files”) and to erase files of any type (“Delete Files”). You may wish to use them to build up large dataspaces and to remove obsolete files.

Indexing is an investment — it costs a little time and disk space, but in exchange you get the ability to browse and correlate information throughout your dataspace in real time. Specifically, a TEX index consists of two files: one file of “keys” (which are the distinct words and their occurrence counts in your dataspace), and the other file of “pointers” (which indicate where to find every instance of every word). The index files have the same name as your original text file but with “.k” and “.p”, respectively, appended. They must be kept in the same folder on the Macintosh desktop as your text file, so that TEX can find and use them.

TEX index-building is *fast*! On a standard Macintosh Plus with a slow hard disk drive, I typically find that indexing proceeds at about 3 megabytes/hour. On a Mac II, I've measured speeds of 12-16 MB/h. In one experiment, on a Sun Workstation (using a TEX-compatible indexing program), I got over 50 MB/h. Of course, your mileage may vary....

The TEX index files are optimized for speed in building, browsing and retrieving, but they are fairly space-efficient too. Every distinct word in your input text file requires 32 bytes in the TEX keys file, and each occurrence of a word uses four bytes in the TEX pointers file. Thus, for small files where there are many different words, the .k and .p files often add up to 150% or so of the size of the text file. But as files get larger, efficiency increases since there are fewer new words showing up per megabyte. In my experience, for files in the 5-50 MB range, the index structures add about 80% overhead to the original text file. That's pretty good compared to many other database systems.
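The per-entry costs quoted above make it easy to estimate index size before you commit disk space. The sketch below simply multiplies them out: 32 bytes per distinct word in the .k file and four bytes per occurrence in the .p file. The function names are hypothetical and are not part of the TEX XFCNs, and the last helper assumes (this is not stated in the text) that a pointer is an unsigned 32-bit byte offset into the text file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical estimation helpers; the per-entry costs come from the Help text. */
#define KEY_BYTES_PER_DISTINCT_WORD 32u  /* one record in the .k (keys) file */
#define POINTER_BYTES_PER_OCCURRENCE 4u  /* one pointer in the .p (pointers) file */

/* Estimated size of the .k file, in bytes. */
uint64_t keys_file_bytes(uint64_t distinct_words)
{
    return distinct_words * KEY_BYTES_PER_DISTINCT_WORD;
}

/* Estimated size of the .p file, in bytes. */
uint64_t pointers_file_bytes(uint64_t word_occurrences)
{
    return word_occurrences * POINTER_BYTES_PER_OCCURRENCE;
}

/* Combined .k + .p size as a percentage of the original text file size. */
double index_overhead_percent(uint64_t text_bytes, uint64_t distinct_words,
                              uint64_t word_occurrences)
{
    assert(text_bytes > 0);
    uint64_t index_bytes = keys_file_bytes(distinct_words) +
                           pointers_file_bytes(word_occurrences);
    return 100.0 * (double)index_bytes / (double)text_bytes;
}

/* Assumption: a pointer is an unsigned 32-bit byte offset, so the largest
 * addressable text file is 2^32 bytes (4096 MB). */
uint64_t max_addressable_text_bytes(void)
{
    return (uint64_t)UINT32_MAX + 1u;
}
```

For a hypothetical 1,000,000-byte file with 20,000 distinct words and 170,000 word occurrences, `index_overhead_percent` returns 132.0 (640,000 bytes of keys plus 680,000 bytes of pointers). The same arithmetic suggests where the large-file figure comes from: once few new words appear, the pointers dominate, and if an average word plus its separator takes about five bytes of text (an assumption that depends on the material), four bytes per occurrence works out to roughly 4/5, or 80%.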
During TEX index-building, you need to have room on your disk for a couple of extra copies of your original text file. That space is used to hold temporary/partial index files which are automatically erased when indexing is finished. If you try to build an index without having enough free space on your disk to hold the temporary files, you are likely to have a severe system crash! (You shouldn't lose any data, however — see the section IN CASE OF TROUBLE for information on how to recover.)

Also during indexing, you need to be sure that you do not have any pre-existing files around with names the same as your dataspace file plus “.k” or “.p” at the end. If you have such files, the TEX indexer will not be able to name its final index files properly.

TEX was designed for large files — in anticipation of optical storage and cheaper big hard disks in the near future. Its current upper limit is set by the use of four-byte pointers, which restricts a dataspace to a few thousand megabytes. But that limit can be lifted by a bit of reprogramming when the need arises in a few years.

There are several ways to create an index. First, you can click the “Build Index” button in the TEX stack itself, as discussed above. Alternatively, you can run a generic indexer such as my “qndxr.c” stand-alone application. It indexes somewhat faster than TEX does, since without HyperCard in memory there is more space for text buffers and pointer arrays. The qndxr.c program also provides some additional options, such as the ability to retain embedded punctuation in an index (so that words like “qndxr.c” are indexed as a single entry, instead of as “qndxr” and “c” separately). Variants of qndxr.c run on Suns, VAXen, etc., if you have access to one of them and can pass files back and forth easily. On the other hand, qndxr.c is a traditional command-line oriented program without a civilized user interface, and has various other limitations that make it unpleasant to use at times.
A third approach is to build an index in background under MultiFinder on the Macintosh. My program “MultIndexer” lets you do just that. It has the unæsthetic command-line UNIX-like interface that qndxr.c suffers from, but once you get it running, you can jump into another memory partition and keep working while the indexing proceeds for you. MultIndexer is usually a bit slower than the other index-building programs, since it doesn't get as much processor time in background and has to work in a smaller memory space. (MultIndexer can run in as little as 160 kB.) If you need to do something else while a large file is being indexed, MultIndexer may be the solution.

Both qndxr.c and MultIndexer are free, public-domain programs; I include them and their source code (in C) on my standard TEX software disk. As mentioned elsewhere, send $15 along with a self-addressed stamped envelope and a formatted 800kB Mac disk to get them. They have been posted on many public bulletin boards and networks also, so you may want to check the archives there. As of today, the current version of qndxr.c is 0.4, and of MultIndexer is 0.41. I plan no further enhancements of either of them.

-------------Elementary Browsing--------------

The TEX user interface consists of a very intuitive, “what you see is what you get” set of windows into various views of a dataspace. There are three levels of dataspace browsing: Index, Context, and Text. In a loose way, the three levels are analogous to one-dimensional, two-dimensional, and three-dimensional geometric spaces.

Begin with the Index display. (If you are running TEX, open a dataspace now and you'll immediately see the Index windows; if you've been browsing around, just click on the 'Index' button to return there.) The TEX Index view gives you two independently-scrolling windows into an alphabetized list of words — every word in your dataspace, each with a count of how many times it appears.
For instance, my index of the King James translation of the Bible begins with:

   8277 A
    335 AARON
      1 ABADDON

(The word “A” appears 8,277 times, “ABADDON” only once, etc.) Scroll around in the Index windows and see the words and their occurrence counts for your dataspace. (The things which look like scroll bars are actually buttons, but they work as your intuition suggests they should.) Click on the ^Z thumb tab (elevator) button and type in a target word in order to jump long distances in the index.

When you find a word that seems interesting, just click on the word itself in the Index window. TEX immediately goes to the disk and assembles for you a “key-word-in-context” display, which I call the Context view. It consists of the actual occurrences of your word, centered in a window, with half a line of context on each side. Thus, for example, if we click on the word “ABARIM” in the Index view, we get:

    into this mount Abarim, and see the land which I have given u
    he mountains of Abarim, before Nebo. NUMBERS 33:48 And th
    he mountains of Abarim, and pitched in the plains of Moab by
   to this mountain Abarim, {unto} mount Nebo, which {is} in the

(It looks better with a monospaced font like Monaco, as in the actual Context window.) The four instances of “ABARIM” are lined up, and we can see enough characters on either side to judge whether further browsing is worthwhile.

The Context view has “scroll bars” like the Index windows had, so you can scroll around freely to see all the occurrences of any of the words in your dataspace. As in the Index view, the ^Z button lets you jump greater distances than are convenient to scroll. If at any time you want to go back to the Index view, just click the Index button at the bottom right of the screen.

When you locate a promising entry in the Context view, simply click anywhere on that line.
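The Context view described above can be approximated in a few lines of C. This is a minimal sketch, not TEX's own routine: it assumes the position of each occurrence is already known as a byte offset into the text (what the pointers file is assumed to supply), places the key word at the middle column of a fixed-width line, pads with blanks where the text runs out, and shows line breaks and tabs as blanks. The name `kwic_line` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key-word-in-context helper (not TEX's actual code).
 * Fills out[0..width-1] with the text surrounding the key word that starts
 * at byte offset pos, so that the key word begins at column width / 2.
 * Columns falling before the start or past the end of the text are blanks.
 * out must have room for width + 1 bytes (the line is NUL-terminated). */
void kwic_line(const char *text, size_t len, size_t pos, size_t width,
               char *out)
{
    assert(pos < len);
    size_t center = width / 2;
    for (size_t col = 0; col < width; col++) {
        /* Column `center` maps to pos; other columns map to its neighbours. */
        ptrdiff_t offset = (ptrdiff_t)pos + (ptrdiff_t)col - (ptrdiff_t)center;
        char c = ' ';
        if (offset >= 0 && (size_t)offset < len) {
            c = text[offset];
            if (c == '\n' || c == '\r' || c == '\t')
                c = ' '; /* keep every entry on one display line */
        }
        out[col] = c;
    }
    out[width] = '\0';
}
```

Calling `kwic_line` once per occurrence with the same `width` yields lines in which the key word always starts in the same column, so in a monospaced font the occurrences line up as in the ABARIM example.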
TEX fetches the actual text of your dataspace file surrounding the entry you clicked and puts it into a standard text field, so you can scroll around and read it, copy out sections for your notes, etc. TEX uses the standard HyperCard "Find" command to locate and put a box around the text you've requested. If, while scrolling around in the Text view, you find that you want to see more information from above or below the extracted selection of text, click a button (“Move Up in Text” or “Move Down in Text”) to retrieve it.

If you see a word in the Text window that looks promising for further research, highlight it with the mouse and click on the “Find Selection” button. TEX will locate your word in the Index view and automatically return you there. This makes it possible to do rapid, interactive browsing and creative discovery with a minimum of typing.

For your convenience in taking notes on what you read, TEX provides a “Notes” button which takes you to a card with a scrolling text field. You may want to accumulate clippings from the Text view there, write comments on what you've read, etc.

When you are finished using TEX and leave the stack, you will be asked “Close any open dataspace(s)?”. For now, always answer the default, “Yes”; exceptions will be discussed in the next section.

That completes the survey of “elementary” browsing techniques that TEX supports. The earlier public-domain system, TEXAS, had all of those capabilities, though not as nicely implemented in some cases. If that were all that TEX had to offer, it would be nice but not a significant step forward.

-----------Advanced Techniques & Concepts------------

The simple Index/Context/Text views and retrieval tools discussed above are sufficient for many purposes. But when you have a very large dataspace, or more than one large dataspace to work with, you may want to apply some more advanced techniques.
TEX provides several, and also gives you the opportunity to develop and customize the standard TEX stack to meet your specific needs.

•Subspace browsing: The first and most important new feature of TEX is the “Subspace” button. If you're browsing in a dataspace, click Subspace now. You'll return to the Index view, and will see the display of word counts change. You've gone into a subspace of the whole dataspace, and now the count information tells you how many occurrences of each word are in your subspace.

Subspace browsing is a bit like proximity searching in a more conventional database program — but it's far more convenient and intuitive to use. You define a subspace by clicking on the count columns in an Index window.

For a simple example, suppose you want to look for data about Apple Computer, but the words “APPLE” and “COMPUTER” occur too many times in your dataspace to browse conveniently in the usual Context view. So, you go into a subspace by clicking the Subspace button. You scroll or jump to the word “APPLE” and click on the count column; it shows ~0% before you click, and ~100% afterwards. Now when you scroll to the word “COMPUTER”, you'll see that only some fraction, perhaps ~3%, of the occurrences of “COMPUTER” were in the neighborhood of “APPLE”. Click on the word “COMPUTER” itself (not the count column) to see those instances of “COMPUTER” that are in your subspace.

When you're using a subspace, the word count information is shown exactly for words that don't occur too many times. For instance, “23/92 FNORD” means that 23 out of a total of 92 instances of the word “FNORD” are in your subspace. Words that occur more often than a chosen threshold are shown as an approximate percentage, as in the above example with “COMPUTER”.

Subspace browsing can be much more sophisticated than our first simple example showed. You can add or subtract the neighborhoods of words in various orders, so that your subspace holds a complex combination of terms.
For instance, it's easy with a few clicks to define a subspace consisting of the words in the neighborhoods of “HYPERCARD”, “HYPERTALK”, or “STACK” but not in the vicinity of “XFCN” or “XCMD”.

The mechanics of subspace definition are straightforward and become intuitive in a short time. When you click on the Subspace button, if you don't have a currently-active subspace then a new, empty one is created for you; if you have a subspace, it is discarded and you're back in the full dataspace. Once you're using a subspace, every click on the count column of an Index window adds a word's neighborhood to the subspace or subtracts it. If you click on a count that is zero, then that word's neighborhood is added to the subspace and the count goes up to 100% (or N/N). If you click on a count that is nonzero, then that word's neighborhood is subtracted from the subspace and its count goes to zero.

If you want to change the definition of a word's neighborhood, just click on one of the radio buttons marked “Words”, “Sentences”, or “Paragraphs”, which appear when a subspace is active. If “Words” is selected, clicking on a count adds a neighborhood of a few words on each side of that item to the subspace; “Sentences” broadens the definition of a neighborhood to be a few sentences on each side of the chosen term; and “Paragraphs” does what it implies and further enlarges the neighborhood.

Once you've defined a subspace by clicking on the count column of the Index windows, a click on a word takes you to the Context view — but with a difference. This time, only those occurrences of the word that are members of your subspace are shown on a line with contextual information around them. If you've defined a sparsely populated subspace, the Context view may contain lines of dots, “...........”, showing places where 100 words were skipped without finding one that fell within your subspace.
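The add/subtract mechanics and the count display can be illustrated with a small model. The representation here is an assumption chosen for clarity, not a description of TEX's internals: a subspace is one flag per word position in the dataspace, a neighborhood is a fixed radius of word positions (roughly the “Words” setting; sentence and paragraph neighborhoods are not modeled), and the switch from exact counts to percentages uses a caller-supplied threshold, because the text does not give TEX's value. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical subspace model: one flag per word position in the dataspace. */
typedef struct {
    bool *in;      /* in[i] is true when word position i is in the subspace */
    size_t nwords; /* number of word positions in the dataspace */
} Subspace;

/* Add (value == true) or subtract (value == false) the neighborhood of
 * `radius` word positions on each side of every occurrence of a term. */
void subspace_mark(Subspace *s, const size_t *occ, size_t nocc,
                   size_t radius, bool value)
{
    assert(s->nwords > 0);
    for (size_t k = 0; k < nocc; k++) {
        assert(occ[k] < s->nwords);
        size_t lo = occ[k] > radius ? occ[k] - radius : 0;
        /* Clamp the upper end without overflowing occ[k] + radius. */
        size_t hi = radius < s->nwords - 1 - occ[k] ? occ[k] + radius
                                                    : s->nwords - 1;
        for (size_t i = lo; i <= hi; i++)
            s->in[i] = value;
    }
}

/* Number of a term's occurrences that fall inside the subspace. */
size_t subspace_count(const Subspace *s, const size_t *occ, size_t nocc)
{
    size_t n = 0;
    for (size_t k = 0; k < nocc; k++)
        if (occ[k] < s->nwords && s->in[occ[k]])
            n++;
    return n;
}

/* Index-window count: exact "n/total" for terms with at most `threshold`
 * occurrences, otherwise a rounded percentage such as "~3%". */
void format_count(size_t n, size_t total, size_t threshold,
                  char *buf, size_t bufsize)
{
    if (total <= threshold)
        snprintf(buf, bufsize, "%zu/%zu", n, total);
    else /* total > threshold, so total >= 1 here */
        snprintf(buf, bufsize, "~%zu%%", (n * 100 + total / 2) / total);
}
```

Marking the neighborhoods of one term's occurrences and then counting another term mirrors the APPLE/COMPUTER example: only occurrences near the first term are counted. In this model a subtraction clears positions even if another term had added them, so the order of clicks matters.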
A click on a non-empty line of the Context view takes you, as usual, immediately into the full Text view of your dataspace. The text displayed is complete and is not restricted by your choice of subspace.

•Multi-dataspace browsing: If your information comes in distinct categories and you don't wish to simply merge all of it into a single overarching dataspace, you may want to open multiple dataspaces for simultaneous browsing. TEX lets you do that easily. All of the TEX database browsing services are contained on a single card. So, to open multiple dataspaces, just copy that card and then paste it (as many times as you like) into the TEX stack! (Be sure to do it when no dataspaces are open, or you may confuse the system.) You may also want to add buttons such as “Next” or “Prev” and put distinguishing features on each of your browsing cards, to avoid confusion.

The Macintosh operating system imposes a limit on how many dataspaces you can open at one time. Each open dataspace uses three file paths (to the text, keys, and pointers files), and depending on the configuration and version of the system that you are using, the limit on available dataspaces may vary. I have browsed six dataspaces simultaneously, but have not tested the limits of this feature thoroughly and would appreciate any reports of your own experiences. (You should probably let TEX close all open dataspaces before trying to build new index files, as parts of that operation require opening up to ten file paths. Be careful about appending files, calling up desk accessories, or doing other I/O-intensive tasks while browsing many dataspaces at once!)

•Leaving with open dataspaces: When you leave the TEX stack, as mentioned earlier you are asked, “Close any open dataspace(s)?”. In general, you should answer “Yes”, and allow TEX to close the links to files and release the subspaces you have created.
(You can close a dataspace by hand simply by clicking on the “Open Dataspace” button and then choosing “Cancel” instead of a file name.) But if you just want to go within HyperCard to another stack and then return to TEX for further browsing, it's usually all right to keep the files and subspaces open. You may even be able to quit HyperCard and then return to an open dataspace, although that's getting dangerous: any subspace that you may have defined will be invalid, and deadly consequences are possible. You also won't be able to discard or otherwise manipulate an open dataspace file while outside of TEX. On the other hand, if you are truly just going to leave TEX for a few moments and then return to continue work, it's convenient to leave dataspaces open. And if you plan to shut down operations and turn off your Macintosh immediately, it isn't necessary to take the time to close all dataspaces; the operating system will do that for you. But in general, it is safest to let TEX close any open dataspaces for you whenever you leave the stack.

————————————————IN CASE OF TROUBLE—————————————————

TEX is designed to keep you out of trouble, but sometimes the unexpected happens. If you get stuck, particularly with a message about a “...fatal error...” of some sort, you should probably quit from HyperCard, reboot your Macintosh, and then look around. Did you run out of disk space while building an index? Did you try to reindex a changed text file without first discarding the old indices? Do you have anything unusual about your system that might have caused a bad interaction with TEX? After a bad crash, especially while index-building, you may be left with a large number of temporary index files on your disk, with names such as “z0k0”, “z2p17”, etc. It may take the Finder a long time even to open the folder where your index-building was taking place. Be patient, and trash all such files before trying to build any further indices.
(You can use the TEX “Delete Files” button to get rid of them if you like.) Apart from the two buttons labeled “Delete Files” and “Append Files”, no TEX operation should ever change the contents of any of your files. Let me know at once of any apparent exceptions, please! If you get a beep and a warning of a non-fatal error, you can probably continue working, but think first. Did you try to use an advanced feature without reading the instructions above? (That's OK, as long as you don't make a mistake!) Did you attempt to open a dataspace text file which hasn't been indexed, or which has been changed since indexing last took place? Did you try to build an index to a text file which already had old keys and pointers files on disk? (You will find some new index files with temporary names such as “z3k0” and “z3p0”; throw away the old .k and .p files and rename the new ones to take the old files' names.) Did you scroll off one end of the dataspace? Were you pushing the envelope in some other direction? TEX attempts to catch any errors you may make, but I would greatly appreciate being notified of any other “safety nets” which I should incorporate in the next version of the system.

——————————————SOME TECHNICAL NOTES————————————————

Sorry — I've run out of room in this HyperCard field to go into the gory details of how TEX works! For more information, see the instructions for ordering a disk with all the C source code. The C routines are well commented and not too hard to understand, and they provide the best documentation of what is going on. Or click the button “Technical Notes” to get a brief review of the TEX XFCN calling conventions and parameters, and the return values that they send back to HyperTalk.

—————————————————FINAL REMARKS———————————————————

I've enjoyed developing and using TEX tremendously.
Please pass this stack along to your friends, please have fun with it, please send in your license fees, and please let me know if you enjoy it and if you make any discoveries or achieve any intellectual triumphs that TEX deserves credit for helping with. Read the “Author's Remarks”, and share your ideas with the world. ^z — 19880904
-- part contents for card part 107 ----- text -----
«click on this field to dismiss it» Some brief, impassioned, editorial remarks by the author: What you have in front of you is something precious. It represents years of my work, and includes ideas and algorithms from countless other people. A commercial product like this would cost hundreds or even thousands of dollars per copy. You get it for a very low fee — but with your acceptance comes an obligation: Take advantage of this stack! Use it to index and browse your collections of information. Discover new things; share those discoveries with the world; make your life and the lives of others better. And please let me know how TEX has aided you, and what new features you would like to see included in the system. If you're a software developer, take my algorithms and data structures and build upon them — add auxiliary indices or embedded tags to hold information about documents, zones, paragraphs, sentences, whatever! Extend the user interface that I've begun to implement. Figure out alternative ways to interact with the data — cross-correlate information, recognize natural language patterns, sort and search in different dimensions, browse multiple dataspaces, retrieve graphics, include annotations or hypertextual links, etc. Fold in ideas that you've seen elsewhere. Most importantly, leave hooks so that other developers can build upon your work. And please let me know about your extensions. I feel strongly that closed, proprietary index structures are a big mistake.
Software tied to such an index is fundamentally weak — it can't be extended or adapted to new needs except by the permission of the owner, and it can't take full advantage of new ideas or technologies. My TEX system is completely open — there are no secrets! If you build upon it, both you and the users of your software will profit far more than if you try to make a short-term killing in the market. Excessive selfishness by software developers doesn't pay. Think of yourself as a scholar, a scientist — publish your work, share it with the world, and go into intellectual collaboration with your colleagues. Join the community of researchers. You'll get more accomplished, you'll help more people, and you won't have to feel guilty about stealing ideas ... you gain the right to use those ideas by sharing yours on an equal basis. Do something you can be proud of. Add your spark “...to the smouldering fire of Man's emergence from his savage past.” (from a poem by Richard L. Ropiquet, ca. 1964) ^z — 19880904
-- part contents for card part 108 ----- text -----
graphical design by -andReas-> Vichr
-- part contents for card part 110 ----- text -----
--------------Greetings, program!--------------------
Welcome to the technical documentation for TEX, short form! (It has to be in this ugly font, or it wouldn't be technical.) To get the full story of TEX, you really have to read my C code and comments thereto. See the Help field to find out how to order the source code and other programs, if you don't find what you need on a bulletin board or network, or from a users group. (In brief, send me $15 if you're not already a registered TEX individual/family user, $5 if you are, or $5+$40*N if you are writing for a corporation using N copies of TEX simultaneously and haven't paid the license fee yet. And please read my Author's Remarks about sharing software and ideas!) This information is copyright © 1988 by Mark ^Zimmermann - all rights reserved.
It is subject to change without notice, as future versions of TEX evolve. I am not responsible for errors or omissions herein; see the Help field for further legalistic disclaimers. This technical information field contains various excerpts from the documentation and comments of some of my C programs. It may be of particular interest to those who would like to customize TEX or who want to use TEX's external functions in other stacks. If you want to work on TEX and improve it at the HyperTalk scripting level, begin by looking at all the scripts and at the names and contents of the various invisible fields. Those fields correspond directly to the XFCN parameters (discussed below) which control how information is retrieved from a dataspace. Thus, if you want to change the amount of text fetched in a single chunk, you simply need to put a new value into the field textChunkSize. If you want to make the index windows show a different number of lines, change the contents of field indexLines. To change the threshold for skipping lines in the context view of a subspace, or to change the maximum number of words sampled when counting in an index subspace view before switching to a percentage estimate, alter the values in fields maxContextLinesSkipped or maxIndexSampleCount respectively. Many changes and customizations to TEX are quite easy to make. Be creative, and please let me know what variations you find most useful! Write me at the address given in the Help field (the same place that you send your license fee to - you have sent in your license fee, haven't you?), or send me electronic mail. Many thanks - your suggestions will help make the next version of TEX better for everybody. TEX uses four significant XFCNs: one to build indices, one to append or delete files, one to open and close dataspace files, and one very important XFCN which does all the real work of moving around and retrieving Index, Context, and Text views from dataspaces.
In that order, they are:

indexTEXfile():
/*
 * call the XFCN as 'indexTEXfile()' ... it returns with nothing
 * if all goes well, and an error message (if possible) otherwise...
 */

appendDeleteFiles():
/*
 * Call this XFCN as "appendDeleteFiles (cmd)". If cmd begins with the
 * letter 'A', then a standard files dialog will offer a list of TEXT
 * files; if one is selected, then the offer will be repeated and
 * subsequent selections will be appended to the end of the first
 * one. If cmd begins with the letter 'D', then files selected from
 * the std files dialog (which shows all files now, not just TEXT files)
 * will be irreversibly (probably? barring low-level magic) deleted.
 */

openCloseTEXFiles():
/*
 * Call this XFCN as 'openCloseTEXfiles([optional list of numbers here])'.
 *
 * If the XFCN is called with no arguments, it puts up the standard files
 * dialog box and gives back in return the file refNums for the text
 * (document/dataspace) file, *.k, and *.p files, on one line, in
 * that order, separated by spaces. Note that the *.k and *.p files
 * must be in the same folder as the master dataspace file; an earlier
 * version of this XFCN tried to allow them to be elsewhere (or to
 * have different names), but it was too complex and didn't seem to be
 * appreciated by the users!
 *
 * If the XFCN is passed any numerical parameters, it attempts to
 * close them (assuming they are file refNums) and doesn't try to
 * open anything....
 *
 * If an error seems to occur while opening a file, the XFCN attempts
 * to close all files that have already been opened, and returns with
 * an error msg instead of the three file refNums. No error checking
 * is done on closing files, since there's not much that one could
 * do about such an error in any case....
 */

zbrowser():
/* the following parameter lists govern what zbrowser() does...
 *
 * ("CONTEXT", instanceNum, contextLines, targetContextLine,
 *      contextLineLength, contextWordOffset, maxContextLinesSkipped,
 *      ptrFileRefNum, textFileRefNum, subspaceHandle)
 *  --returns with contextLines of display followed by contextLines
 *      of instanceNum-textPtr pairs, with context instance instanceNum
 *      on line targetContextLine...
 *
 * ("EMPTYSUBSPACE", subspaceHandle)
 *  --returns quietly with nothing if it successfully sets all bits
 *      in the subspace flag array to zero; beeps and gives an error msg
 *      if it fails somehow...
 *
 * ("FILLSUBSPACE", subspaceHandle)
 *  --returns quietly with nothing if it successfully sets all bits
 *      in the subspace flag array to one; beeps and gives an error msg
 *      if failure...
 *
 * ("INDEX", wordNum, indexLines, maxIndexSampleCount, indexCountWidth,
 *      indexKeyWidth, keyFileRefNum, ptrFileRefNum, subspaceHandle)
 *  --returns with indexLines of index window display, followed by
 *      indexLines of instanceNums. The index lines are:
 *      indexCountWidth columns of occurrence count info (right-justified),
 *      a blank column, and indexKeyWidth columns of keyWord (in all
 *      caps, left-justified). Demand that indexCountWidth be at least
 *      5, to allow for subindex count display, and that indexKeyWidth
 *      be in the range 1 through KEY_LENGTH = 28 ...
 *
 * ("LOCATE", targetString, keyFileRefNum)
 *  --returns wordNum for the targetString if it is found in the
 *      key file; otherwise returns wordNum for the word alphabetically
 *      preceding targetString followed by "{targetString not found!}"
 *      on the second line of the answer...
 *
 * ("NEWSUBSPACE", textFileRefNum)
 *  --returns subspaceHandle for a new subspace that it creates, big
 *      enough to do subspace browsing -- but does NOT initialize that
 *      subspace or check to see whether another subspace already
 *      exists. Beeps and gives error msg if it fails...
 *
 * ("RELEASESUBSPACE", subspaceHandle)
 *  --returns quietly with nothing if successful in releasing the
 *      subspaceHandle, or noisily with an error message if it fails...
 *
 * ("SETSUBSPACEBITS", wordNum, neighborhoodSize, setOrClear,
 *      keyFileRefNum, ptrFileRefNum, subspaceHandle)
 *  --returns quietly with nothing if it is successful in setting or
 *      clearing (depending on setOrClear's value, 0 or non-0) the
 *      bits in the subspace flag array in the neighborhood of the
 *      chosen word(s); gives an error msg if there was a problem.
 *      neighborhoodSize is in characters and is used to determine
 *      how many bits to set/clear on each side of the instances...
 *
 * ("TEXT", textPtr, textChunkSize, textOffset, textFileRefNum)
 *  --returns with (if possible; see below)
 *      textChunkSize bytes of text from the text file,
 *      starting at byte number textPtr-textOffset+1 and ending
 *      just before byte number textPtr-textOffset+textChunkSize+1.
 *      (The '+1' is to match up with HyperCard's 1-based counting
 *      convention, rather than the 0-based C convention!!)
 *      If the file isn't big enough or if textPtr is too near the
 *      beginning or end of the file, cut off the retrieved text
 *      at that boundary and insert the words {beginning of dataspace}
 *      or {end of dataspace}. ***Do no filtering of the text!***
 *      (Thus, there may be strangenesses if the 'text' file has
 *      '\0' or other nasty characters in it -- sorry about that!)
 *      Restrict textChunkSize to <32000 bytes. After the text, on
 *      a separate line, return three numbers: the byte number of
 *      the first char returned relative to the beginning of the text
 *      file, the actual offset within the characters returned
 *      of the originally-requested textPtr, and the byte number
 *      of the character after the last char returned relative to
 *      the beginning of the text file.
 */
- ^z - 19880904